Batuhan Tunçel's HW2

This assumes the data folder is in your working directory, organized as 'working_directory/dataset_name/'.

We work with the data in long format, which makes visualization and analysis easier. First, add an id variable (in data.table notation), rename column "V1" to "class", and sort the rows by class.

  1. Melt the data into long format.
  2. Extract the numerical part of the variable name (which represents time) by using gsub to replace the non-numerical part with an empty string.
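The preparation steps above can be sketched in plain Python as a stand-in for the R data.table code, which is not reproduced here. The sketch assumes each wide row is a dict holding the class in "V1" and the observations in "V2", "V3", ...; the function name to_long is hypothetical, and re.sub plays the role of gsub.

```python
import re


def to_long(wide_rows):
    """Convert wide rows (dicts with a "V1" class column and "V2", "V3", ...
    value columns) into long records with id, class, time, and value."""
    # Add an id to copies of the rows, then rename V1 -> class.
    rows = [dict(r, id=i + 1) for i, r in enumerate(wide_rows)]
    for r in rows:
        r["class"] = r.pop("V1")
    # Sort by class; Python's sort is stable, so ids keep their order within a class.
    rows.sort(key=lambda r: r["class"])

    long_rows = []
    for r in rows:
        for name, value in r.items():
            if name in ("id", "class"):
                continue
            # Like gsub("[^0-9]", "", name) in R: drop every non-digit character.
            # The suffix starts at 2 because V1 was the class column; shift it if
            # time should start at 1.
            time = int(re.sub(r"[^0-9]", "", name))
            long_rows.append({"id": r["id"], "class": r["class"],
                              "time": time, "value": value})
    return long_rows
```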

Using the genlasso package, build a representation of each time series with a penalized regression approach.
Collect the best estimate for each id in long_train_lasso.
Plot the estimated values against the real values.
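genlasso solves generalized lasso problems, minimize 0.5 * ||y - X b||^2 + lambda * ||D b||_1, and computes the whole solution path in lambda. The text does not say which penalty was used, so the sketch below assumes the 1-d fused lasso (X = identity, D = first differences) and solves it for a single fixed lambda by projected gradient on the dual problem. This is a swapped-in solver for illustration, not genlasso's path algorithm, and fused_lasso_1d is a hypothetical name.

```python
def fused_lasso_1d(y, lam, iters=5000):
    """Approximate solution of the 1-d fused lasso
        minimize 0.5 * sum((y[i] - b[i])**2) + lam * sum(|b[i+1] - b[i]|)
    for one fixed lam, via projected gradient on the dual problem."""
    n = len(y)
    u = [0.0] * (n - 1)  # one dual variable per neighbouring pair, kept in [-lam, lam]
    step = 0.25          # safe step size because ||D D^T|| <= 4 for the difference matrix D

    def primal(u):
        # Recover b = y - D^T u, where (D b)[i] = b[i+1] - b[i].
        b = list(y)
        for i, ui in enumerate(u):
            b[i] += ui
            b[i + 1] -= ui
        return b

    for _ in range(iters):
        b = primal(u)
        # Gradient step on 0.5 * ||y - D^T u||^2 (its gradient is -D b),
        # followed by clipping back into the box |u[i]| <= lam.
        u = [max(-lam, min(lam, u[i] + step * (b[i + 1] - b[i])))
             for i in range(n - 1)]
    return primal(u)
```

Larger lam values merge more neighbouring points into flat segments; the text does not state how the best lambda was chosen for each id.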

Compute the squared error of each estimate.
Compute the mean squared error (MSE) for each id.
Draw a boxplot of the lasso MSEs.
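A minimal sketch of the per-id MSE in plain Python, assuming the estimates are available as (id, actual, estimate) tuples; mse_by_id is a hypothetical helper, and the boxplot itself is left out.

```python
from collections import defaultdict


def mse_by_id(records):
    """records: iterable of (id, actual, estimate) tuples -> {id: mean squared error}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for series_id, actual, estimate in records:
        sums[series_id] += (actual - estimate) ** 2  # squared error of one estimate
        counts[series_id] += 1
    return {k: sums[k] / counts[k] for k in sums}
```

The same helper applies unchanged to the regression tree predictions, since only the source of the estimates differs.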

Next, build a regression tree with complexity parameter cp = 0 for each id.
Predict each value.
Collect the predictions in tree_long_train.
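The cp parameter presumably refers to R's rpart package, which the text does not name. The sketch below is a simplified single-feature regression tree grown on the time index: each split minimizes the summed squared error, and with cp = 0 growth stops only at the size limits. The defaults minsplit = 20 and minbucket = 7 mirror rpart's defaults (minbucket = round(minsplit / 3)); pruning, surrogate splits, and rpart's exact stopping rule are not reproduced, and fit_tree and predict are hypothetical names.

```python
def _sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def _grow(pairs, minsplit, minbucket):
    ys = [y for _, y in pairs]
    node = {"mean": sum(ys) / len(ys)}
    n = len(pairs)
    if n < minsplit:  # too few points to attempt a split
        return node
    best = None
    # k = size of the left child; both children must hold at least minbucket points.
    for k in range(minbucket, n - minbucket + 1):
        if pairs[k - 1][0] == pairs[k][0]:  # cannot separate equal time values
            continue
        cost = _sse(ys[:k]) + _sse(ys[k:])
        if best is None or cost < best[0]:
            best = (cost, k)
    # cp = 0: accept any split that lowers the squared error at all.
    if best is None or best[0] >= _sse(ys):
        return node
    k = best[1]
    node["threshold"] = (pairs[k - 1][0] + pairs[k][0]) / 2
    node["left"] = _grow(pairs[:k], minsplit, minbucket)
    node["right"] = _grow(pairs[k:], minsplit, minbucket)
    return node


def fit_tree(times, values, minsplit=20, minbucket=7):
    """Fit a regression tree of values on the time index."""
    return _grow(sorted(zip(times, values)), minsplit, minbucket)


def predict(node, t):
    """Walk down the tree to the leaf containing time t and return its mean."""
    while "threshold" in node:
        node = node["left"] if t < node["threshold"] else node["right"]
    return node["mean"]
```

Predicting every time point of an id and storing the results next to the actual values gives the records that the MSE step needs.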

Compute the squared error of each estimate.
Compute the mean squared error (MSE) for each id.
Draw a boxplot of the regression tree MSEs.
The median MSE of the lasso is smaller than that of the regression tree, although the difference between them is small.
Judged by the IQR as well, the lasso gives better results.
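To compare the two MSE distributions numerically rather than by eye, here is a sketch that returns the median and IQR of a list of per-id MSEs; median_and_iqr is a hypothetical name. It uses statistics.quantiles with method="inclusive", which interpolates linearly like R's default quantile type 7; the hinges drawn by R's boxplot can differ slightly for small samples.

```python
import statistics


def median_and_iqr(values):
    """Return (median, interquartile range) of at least two values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q3 - q1
```

Applying it to the lasso MSEs and to the tree MSEs gives the two numbers behind the comparison above.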

Computing the mean Euclidean distance between each representation and its original series also suggests that the lasso gives better results.
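A sketch of the mean Euclidean distance, assuming originals and representations are dicts mapping each id to equal-length lists of values; mean_euclidean_distance is a hypothetical name. For a series of length n the distance equals sqrt(n * MSE), so per id it carries the same information as the MSE, but its average over ids is not simply a rescaled average MSE.

```python
import math


def mean_euclidean_distance(originals, representations):
    """Average, over ids, of the Euclidean distance between a series and its representation."""
    dists = [math.dist(originals[k], representations[k]) for k in originals]
    return sum(dists) / len(dists)
```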